EDA Project¶

Project Proposal¶

The dataset contains reported crime incidents in Chicago from 2001 to the present, extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. https://catalog.data.gov/dataset/crimes-2001-to-present

My family and I are moving to Chicago, and I would like to know when crime rises and falls and which crimes are most common, so that I can make informed decisions about our safety. I am also curious about what crime might look like over the next five years, based on patterns in the dataset.

The fields that will be most helpful:

  • location
  • year
  • primary_type

I aim to learn where crime is most prevalent, the ratio of crimes to arrests, and the relationship between location and crime type. This information will help my family and me decide where to relocate.

In [1]:
#Load up modules
import pandas as pd
import numpy as np
#set up notebook to display multiple output in one cell
from IPython.core.interactiveshell  import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
In [2]:
crimes = pd.read_csv('crimes.csv',sep= ',')
In [3]:
crimes.info()
crimes.head()
crimes.tail()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ID                    65499 non-null  int64  
 1   Case Number           65499 non-null  object 
 2   Date                  65499 non-null  object 
 3   Block                 65499 non-null  object 
 4   IUCR                  65499 non-null  object 
 5   Primary Type          65499 non-null  object 
 6   Description           65499 non-null  object 
 7   Location Description  65084 non-null  object 
 8   Arrest                65499 non-null  bool   
 9   Domestic              65499 non-null  bool   
 10  Beat                  65499 non-null  int64  
 11  District              65499 non-null  int64  
 12  Ward                  65458 non-null  float64
 13  Community Area        65463 non-null  float64
 14  FBI Code              65499 non-null  object 
 15  X Coordinate          64762 non-null  float64
 16  Y Coordinate          64762 non-null  float64
 17  Year                  65499 non-null  int64  
 18  Updated On            65499 non-null  object 
 19  Latitude              64762 non-null  float64
 20  Longitude             64762 non-null  float64
 21  Location              64762 non-null  object 
dtypes: bool(2), float64(6), int64(4), object(10)
memory usage: 10.1+ MB
Out[3]:
ID Case Number Date Block IUCR Primary Type Description Location Description Arrest Domestic ... Ward Community Area FBI Code X Coordinate Y Coordinate Year Updated On Latitude Longitude Location
0 5741943 HN549294 08/25/2007 09:22:18 AM 074XX N ROGERS AVE 560 ASSAULT SIMPLE OTHER False False ... 49.0 1.0 08A NaN NaN 2007 08/17/2015 03:03:40 PM NaN NaN NaN
1 25953 JE240540 05/24/2021 03:06:00 PM 020XX N LARAMIE AVE 110 HOMICIDE FIRST DEGREE MURDER STREET True False ... 36.0 19.0 01A 1141387.0 1913179.0 2021 11/18/2023 03:39:49 PM 41.917838 -87.755969 (41.917838056, -87.755968972)
2 26038 JE279849 06/26/2021 09:24:00 AM 062XX N MC CORMICK RD 110 HOMICIDE FIRST DEGREE MURDER PARKING LOT True False ... 50.0 13.0 01A 1152781.0 1941458.0 2021 11/18/2023 03:39:49 PM 41.995219 -87.713355 (41.995219444, -87.713354912)
3 13279676 JG507211 11/09/2023 07:30:00 AM 019XX W BYRON ST 620 BURGLARY UNLAWFUL ENTRY APARTMENT False False ... 47.0 5.0 5 1162518.0 1925906.0 2023 11/18/2023 03:39:49 PM 41.952345 -87.677975 (41.952345086, -87.677975059)
4 13274752 JG501049 11/12/2023 07:59:00 AM 086XX S COTTAGE GROVE AVE 454 BATTERY AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MIN... SMALL RETAIL STORE True False ... 6.0 44.0 08B 1183071.0 1847869.0 2023 12/09/2023 03:41:24 PM 41.737751 -87.604856 (41.737750767, -87.604855911)

5 rows × 22 columns

Out[3]:
ID Case Number Date Block IUCR Primary Type Description Location Description Arrest Domestic ... Ward Community Area FBI Code X Coordinate Y Coordinate Year Updated On Latitude Longitude Location
65494 13260483 JG484114 10/30/2023 03:15:00 AM 003XX W 42ND PL 910 MOTOR VEHICLE THEFT AUTOMOBILE STREET True False ... 20.0 37.0 7 1174729.0 1876779.0 2023 11/07/2023 03:41:07 PM 41.817273 -87.634558 (41.817272712, -87.634557976)
65495 13260346 JG483955 10/30/2023 01:25:00 AM 071XX S DR MARTIN LUTHER KING JR DR 1310 CRIMINAL DAMAGE TO PROPERTY APARTMENT False False ... 6.0 69.0 14 1180150.0 1857738.0 2023 11/07/2023 03:41:07 PM 41.764900 -87.615256 (41.764899756, -87.615255831)
65496 13260937 JG484816 10/30/2023 03:30:00 PM 112XX S HOMEWOOD AVE 460 BATTERY SIMPLE RESIDENCE False False ... 19.0 75.0 08B 1165410.0 1830013.0 2023 11/07/2023 03:41:07 PM 41.689143 -87.670065 (41.689143038, -87.670065135)
65497 13261032 JG484978 10/30/2023 07:00:00 AM 003XX E ERIE ST 910 MOTOR VEHICLE THEFT AUTOMOBILE STREET False False ... 2.0 8.0 7 1178676.0 1904855.0 2023 11/07/2023 03:41:07 PM 41.894226 -87.619223 (41.894226067, -87.619222865)
65498 13260611 JG484346 10/30/2023 10:35:00 AM 052XX N BROADWAY 560 ASSAULT SIMPLE SMALL RETAIL STORE True False ... 48.0 77.0 08A 1167370.0 1934861.0 2023 11/07/2023 03:41:07 PM 41.976815 -87.659880 (41.976814727, -87.659880317)

5 rows × 22 columns

EDA Phase 1¶

  1. I hope to learn which crimes are most common. I plan to break down each crime type by year and by exact location.
  2. I have a hunch that burglary will be the most common crime, based on what I have seen in the news over the years. Crime in Chicago is widely talked about, although in my experience the specific crime types are rarely mentioned. I also expect burglary to be the most reported crime within the most recent five years and, based on patterns in the data, to remain so over the next five years.
  1. The crimes were reported in the City of Chicago from 2001 to the present.
  2. The total sample size before cleaning is 65,499 reported crimes.
  3. Nothing about the data stands out at this stage.
In [4]:
crimes = pd.read_csv('crimes.csv')
crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ID                    65499 non-null  int64  
 1   Case Number           65499 non-null  object 
 2   Date                  65499 non-null  object 
 3   Block                 65499 non-null  object 
 4   IUCR                  65499 non-null  object 
 5   Primary Type          65499 non-null  object 
 6   Description           65499 non-null  object 
 7   Location Description  65084 non-null  object 
 8   Arrest                65499 non-null  bool   
 9   Domestic              65499 non-null  bool   
 10  Beat                  65499 non-null  int64  
 11  District              65499 non-null  int64  
 12  Ward                  65458 non-null  float64
 13  Community Area        65463 non-null  float64
 14  FBI Code              65499 non-null  object 
 15  X Coordinate          64762 non-null  float64
 16  Y Coordinate          64762 non-null  float64
 17  Year                  65499 non-null  int64  
 18  Updated On            65499 non-null  object 
 19  Latitude              64762 non-null  float64
 20  Longitude             64762 non-null  float64
 21  Location              64762 non-null  object 
dtypes: bool(2), float64(6), int64(4), object(10)
memory usage: 10.1+ MB

Initial Observations¶

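  • ID, Case Number, Date, Block, IUCR, Primary Type, and Description each have 65,499 non-null values, so they contain no nulls. The nulls are in Location Description (415), Ward (41), Community Area (36), and the coordinate fields X Coordinate, Y Coordinate, Latitude, Longitude, and Location (737 each).
  • Some of the data is not needed for my goals, so I will eliminate 16 columns.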
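
The missing-value counts can be computed directly instead of being read off the `info()` output. The sketch below uses a small made-up frame (`sample` is hypothetical, not the real data) and assumes only that the real frame uses the same column names; `isna().sum()` counts the nulls in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the crimes frame (values are made up)
sample = pd.DataFrame({
    'Primary Type': ['THEFT', 'BATTERY', 'ASSAULT', 'THEFT'],
    'Location Description': ['STREET', None, 'APARTMENT', 'STREET'],
    'Latitude': [41.88, np.nan, np.nan, 41.95],
})

# Count missing values per column, largest first
null_counts = sample.isna().sum().sort_values(ascending=False)
print(null_counts)
```

On the real data, the same call would be `crimes.isna().sum()`.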
In [5]:
# list the unique crime types
crimes['Primary Type'].unique()
# show the 5 most frequent crime types
crimes['Primary Type'].value_counts()[:5]
Out[5]:
array(['ASSAULT', 'HOMICIDE', 'BURGLARY', 'BATTERY', 'THEFT',
       'CRIMINAL DAMAGE', 'DECEPTIVE PRACTICE', 'MOTOR VEHICLE THEFT',
       'CRIMINAL SEXUAL ASSAULT', 'OFFENSE INVOLVING CHILDREN', 'ROBBERY',
       'OTHER OFFENSE', 'SEX OFFENSE', 'WEAPONS VIOLATION', 'STALKING',
       'OBSCENITY', 'CRIMINAL TRESPASS', 'PROSTITUTION', 'ARSON',
       'NARCOTICS', 'KIDNAPPING', 'CONCEALED CARRY LICENSE VIOLATION',
       'INTERFERENCE WITH PUBLIC OFFICER', 'PUBLIC PEACE VIOLATION',
       'LIQUOR LAW VIOLATION', 'NON-CRIMINAL', 'INTIMIDATION',
       'HUMAN TRAFFICKING', 'GAMBLING', 'OTHER NARCOTIC VIOLATION',
       'CRIM SEXUAL ASSAULT'], dtype=object)
Out[5]:
Primary Type
THEFT                  14831
BATTERY                11094
CRIMINAL DAMAGE         7104
MOTOR VEHICLE THEFT     6243
ASSAULT                 5801
Name: count, dtype: int64
In [6]:
# list the unique crime descriptions
crimes['Description'].unique()
# show the 5 most frequent crime descriptions
crimes['Description'].value_counts()[:5]
Out[6]:
array(['SIMPLE', 'FIRST DEGREE MURDER', 'UNLAWFUL ENTRY',
       'AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MINOR INJURY',
       '$500 AND UNDER', 'TO VEHICLE', 'THEFT BY LESSEE, MOTOR VEHICLE',
       'DOMESTIC BATTERY SIMPLE', 'AUTOMOBILE',
       'AGG. DOMESTIC BATTERY - HANDS, FISTS, FEET, SERIOUS INJURY',
       'FORGERY', 'FINANCIAL IDENTITY THEFT OVER $ 300',
       'ILLEGAL USE CASH CARD', 'NON-AGGRAVATED', 'TO PROPERTY',
       'OVER $500', 'FROM BUILDING', 'BOGUS CHECK', 'CHILD ABDUCTION',
       'ATTEMPT - FINANCIAL IDENTITY THEFT', 'ARMED - HANDGUN',
       'PREDATORY', 'VEHICULAR HIJACKING',
       'SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER',
       'HARASSMENT BY ELECTRONIC MEANS',
       'AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER',
       'SEXUAL EXPLOITATION OF A CHILD',
       'AGGRAVATED CRIMINAL SEXUAL ABUSE', 'RECKLESS FIREARM DISCHARGE',
       'CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER',
       'AGGRAVATED SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER',
       'CYBERSTALKING', 'AGGRAVATED - OTHER', 'RETAIL THEFT',
       'SALE / DISTRIBUTE OBSCENE MATERIAL TO MINOR', 'TO STATE SUP LAND',
       'THEFT OF LOST / MISLAID PROPERTY', 'TO LAND',
       'ATTEMPT - AUTOMOBILE',
       'AGGRAVATED DOMESTIC BATTERY - OTHER DANGEROUS WEAPON',
       'OTHER VEHICLE OFFENSE', 'FALSE / STOLEN / ALTERED TRP',
       'AGGRAVATED - KNIFE / CUTTING INSTRUMENT',
       'VEHICLE TITLE / REGISTRATION OFFENSE', 'CHILD ABUSE',
       'CREDIT CARD FRAUD', 'FINANCIAL IDENTITY THEFT $300 AND UNDER',
       'FRAUD OR CONFIDENCE GAME', 'AGGRAVATED - HANDGUN',
       'ARMED - OTHER FIREARM', 'AGGRAVATED VEHICULAR HIJACKING',
       'ATTEMPT FORCIBLE ENTRY', 'FORCIBLE ENTRY',
       'VIOLATE ORDER OF PROTECTION', 'UNLAWFUL POSSESSION - HANDGUN',
       'HARASSMENT BY TELEPHONE', 'THEFT FROM MOTOR VEHICLE',
       'STRONG ARM - NO WEAPON', 'SOLICITING FOR BUSINESS', 'BY FIRE',
       'TO STATE SUPPORTED PROPERTY', 'TELEPHONE THREAT',
       'POSSESS - CRACK', 'TO RESIDENCE', 'THEFT / RECOVERY - AUTOMOBILE',
       'ENDANGERING LIFE / HEALTH OF CHILD',
       'AGGRAVATED - OTHER DANGEROUS WEAPON', 'UNLAWFUL RESTRAINT',
       'AGGRAVATED', 'ARMED - OTHER DANGEROUS WEAPON',
       'CRIMINAL SEXUAL ABUSE', 'PROHIBITED PLACES', 'OTHER OFFENSE',
       'COUNTERFEITING DOCUMENT', 'TO CITY OF CHICAGO PROPERTY',
       'THEFT OF LABOR / SERVICES', 'POSSESS - HEROIN (WHITE)',
       'RESIST / OBSTRUCT / DISARM OFFICER',
       'NON-CONSENSUAL DISSEMINATION OF PRIVATE SEXUAL IMAGES',
       'BOMB THREAT',
       'FINANCIAL EXPLOITATION OF AN ELDERLY OR DISABLED PERSON',
       'AGGRAVATED - HANDS, FISTS, FEET, SERIOUS INJURY',
       'OTHER VIOLATION', 'CYCLE, SCOOTER, BIKE WITH VIN',
       'ATTEMPT ARMED - HANDGUN', 'POCKET-PICKING',
       'PROTECTED EMPLOYEE - HANDS, FISTS, FEET, NO / MINOR INJURY',
       'UNLAWFUL USE - OTHER FIREARM',
       'AGG. PROTECTED EMPLOYEE - HANDS, FISTS, FEET, SERIOUS INJURY',
       'POSSESS - BARBITURATES', 'OTHER CRIME AGAINST PERSON',
       'ATTEMPT AGGRAVATED', 'HOME INVASION',
       'ILLEGAL POSSESSION CASH CARD', 'LICENSE VIOLATION',
       'STOLEN PROPERTY BUY / RECEIVE / POSSESS',
       'POSSESS - CANNABIS MORE THAN 30 GRAMS',
       'ATTEMPT - CYCLE, SCOOTER, BIKE WITH VIN',
       'SOLICIT NARCOTICS ON PUBLIC WAY', 'COMPUTER FRAUD',
       'POSSESS - COCAINE', 'POSSESS - SYNTHETIC DRUGS',
       'VIOLATION OF STALKING NO CONTACT ORDER',
       'AGGRAVATED FINANCIAL IDENTITY THEFT',
       'UNLAWFUL USE - OTHER DANGEROUS WEAPON',
       'MANUFACTURE / DELIVER - CANNABIS OVER 10 GRAMS',
       'AGGRAVATED OF A SENIOR CITIZEN', 'POSSESS - AMPHETAMINES',
       'AGGRAVATED DOMESTIC BATTERY - KNIFE / CUTTING INSTRUMENT',
       'ARMED - KNIFE / CUTTING INSTRUMENT', 'FALSE POLICE REPORT',
       'AGGRAVATED POLICE OFFICER - OTHER DANGEROUS WEAPON',
       'OBSTRUCTING IDENTIFICATION', 'RECKLESS CONDUCT',
       'BURGLARY FROM MOTOR VEHICLE',
       'AGGRAVATED PROTECTED EMPLOYEE - OTHER DANGEROUS WEAPON',
       'AGGRAVATED - HANDS, FISTS, FEET, NO / MINOR INJURY',
       'FORFEIT PROPERTY', 'AGGRAVATED - OTHER FIREARM',
       'ATTEMPT AGGRAVATED CRIMINAL SEXUAL ABUSE',
       'ATTEMPT STRONG ARM - NO WEAPON', 'ATTEMPT ARSON',
       'LIQUOR LICENSE VIOLATION', 'CRIMINAL DEFACEMENT',
       'GUN OFFENDER - ANNUAL REGISTRATION', 'EMBEZZLEMENT',
       'CHILD PORNOGRAPHY', 'ATTEMPT THEFT', 'COUNTERFEIT CHECK', 'OTHER',
       'PUBLIC INDECENCY', 'FOUND SUSPECT NARCOTICS',
       'AGGRAVATED DOMESTIC BATTERY - HANDGUN', 'OBSTRUCTING SERVICE',
       'GUN OFFENDER - DUTY TO REGISTER', 'POSSESSION OF BURGLARY TOOLS',
       'ATTEMPT ARMED - KNIFE / CUTTING INSTRUMENT', 'PURSE-SNATCHING',
       'VIOLATION GPS MONITORING DEVICE', 'AGGRAVATED OF A CHILD',
       'POSSESS - CANNABIS 30 GRAMS OR LESS', 'ANIMAL ABUSE / NEGLECT',
       'OBSCENE TELEPHONE CALLS',
       'THEFT / RECOVERY - CYCLE, SCOOTER, BIKE WITH VIN',
       'AGGRAVATED POLICE OFFICER - HANDGUN',
       'OTHER CRIME INVOLVING PROPERTY', 'KIDNAPPING',
       'OBSTRUCTING JUSTICE', 'CONCEALED CARRY LICENSE REVOCATION',
       'ARMED WHILE UNDER THE INFLUENCE', 'EMPLOY MINOR',
       'UNLAWFUL POSSESSION - AMMUNITION', 'POSSESSION OF DRUG EQUIPMENT',
       'AGGRAVATED POLICE OFFICER - HANDS, FISTS, FEET, NO INJURY',
       'PEEPING TOM', 'OF AN UNBORN CHILD',
       'SELL / GIVE / DELIVER LIQUOR TO MINOR', 'INTIMIDATION',
       'POSSESS - HALLUCINOGENS', 'POSSESS - PCP',
       'CHILD ABDUCTION / STRANGER', 'DECEPTIVE COLLECTION PRACTICES',
       'STATE BENEFITS FRAUD', 'TRUCK, BUS, MOTOR HOME',
       'MANUFACTURE / DELIVER - CRACK',
       'VIOLENT OFFENDER - ANNUAL REGISTRATION',
       'POST GRAPHIC INFO PORGNOGRAPHIC INTERNET OR POSS GRAPHIC INF',
       'ALTER / FORGE PRESCRIPTION', 'UNLAWFUL USE - HANDGUN',
       'AGGRAVATED POLICE OFFICER - KNIFE / CUTTING INSTRUMENT',
       'AGGRAVATED COMPUTER TAMPERING',
       'CONTRIBUTE TO THE DELINQUENCY OF CHILD', 'ARMED: HANDGUN',
       'INDECENT SOLICITATION OF A CHILD',
       'AGGRAVATED P.O. - HANDS, FISTS, FEET, SERIOUS INJURY',
       'SEX OFFENDER - FAIL TO REGISTER',
       'VIOLATION OF CIVIL NO CONTACT ORDER', 'INVOLUNTARY SERVITUDE',
       'SOLICIT ON PUBLIC WAY', 'GAME/DICE', 'ATTEMPT NON-AGGRAVATED',
       'AGGRAVATED PROTECTED EMPLOYEE - KNIFE / CUTTING INSTRUMENT',
       'UNAUTHORIZED VIDEOTAPING',
       'MANUFACTURE / DELIVER -  HEROIN (WHITE)',
       'THEFT / RECOVERY - TRUCK, BUS, MOBILE HOME',
       'POSSESS - HYPODERMIC NEEDLE', 'COMMERCIAL SEX ACTS',
       'CYCLE, SCOOTER, BIKE NO VIN',
       'VIOLENT OFFENDER - DUTY TO REGISTER',
       'UNLAWFUL USE OF A COMPUTER', 'CHILD ABANDONMENT',
       'POSSESS - METHAMPHETAMINE', 'ESCAPE', 'ARMED VIOLENCE',
       'OTHER WEAPONS VIOLATION', 'ARSON THREAT', 'OBSCENE MATTER',
       'SEX OFFENDER - FAIL TO REGISTER NEW ADDRESS',
       'ATTEMPT ARMED - OTHER DANGEROUS WEAPON', 'EXTORTION',
       'MANUFACTURE / DELIVER - PCP',
       'POSSESS - HEROIN (TAN / BROWN TAR)', 'POSS: COCAINE',
       'UNLAWFUL SALE - HANDGUN',
       'MANUFACTURE / DELIVER - CANNABIS 10 GRAMS OR LESS',
       'EAVESDROPPING', 'MANUFACTURE / DELIVER - BARBITURATES',
       'AGGRAVATED PROTECTED EMPLOYEE - HANDGUN', 'IMPERSONATION',
       'POSS: HEROIN(WHITE)', 'POSS: CRACK',
       'UNLAWFUL VISITATION INTERFERENCE',
       'ATTEMPT AGGRAVATED - KNIFE / CUTTING INSTRUMENT',
       'INSURANCE FRAUD',
       'GUN OFFENDER - DUTY TO REPORT CHANGE OF INFORMATION',
       'FROM COIN-OPERATED MACHINE OR DEVICE', 'SOLICIT OFF PUBLIC WAY',
       'UNLAWFUL POSSESSION - OTHER FIREARM',
       'POSSESS FIREARM / AMMUNITION - NO FOID CARD',
       'THEFT BY LESSEE, NON-MOTOR VEHICLE', 'DELIVERY CONTAINER THEFT',
       'ATTEMPT AGGRAVATED - OTHER', 'TAMPER WITH MOTOR VEHICLE',
       'SOLICITATION OF A SEXUAL ACT', 'BOARD PLANE WITH WEAPON',
       'PUBLIC AID WIRE/MAIL FRAUD - VIA MAIL/PACKAGE/DELIVERY SYS',
       'MANUFACTURE / DELIVER - METHAMPHETAMINE', 'TO AIRPORT',
       'ATTEMPT ARMED - OTHER FIREARM',
       'TIRE DEFLATION DEVICE DEPLOYMENT', 'PAROLE VIOLATION',
       'AGGRAVATED: HANDGUN', 'UNLAWFUL USE / SALE OF AIR RIFLE',
       'POSSESSION OF PORNOGRAPHIC PRINT',
       'VIOLATION OF BAIL BOND - DOMESTIC VIOLENCE',
       'POSSESSION - EXPLOSIVE / INCENDIARY DEVICE',
       'CRIMINAL DRUG CONSPIRACY', 'WIC FRAUD',
       'VIOLATION OF SUMMARY CLOSURE', 'ATTEMPT - TRUCK, BUS, MOTOR HOME',
       'THEFT / RECOVERY - CYCLE, SCOOTER, BIKE NO VIN', 'CALL OPERATION',
       'MANUFACTURE / DELIVER - SYNTHETIC DRUGS',
       'SEX OFFENDER - PROHIBITED ZONE', 'INTERFERENCE JUDICIAL PROCESS',
       'ATTEMPT CRIMINAL SEXUAL ABUSE', 'SEXUAL RELATIONS IN FAMILY',
       'MANUFACTURE / DELIVER - COCAINE', 'INSTITUTIONAL VANDALISM',
       'AGG CRIM SEX ABUSE - VIC 13-16 YOA - OFF 5 YR OLDER PENETRAT',
       'MANUFACTURE / DELIVER - HEROIN (TAN / BROWN TAR)',
       'ILLEGAL POSSESSION BY MINOR', 'STRONGARM - NO WEAPON',
       'CRIMINAL SEXUAL ABUSE - SEXUAL PENETRATION',
       'HAZARDOUS MATERIALS VIOLATION', 'ATTEMPT AGGRAVATED - HANDGUN',
       'OTHER PROSTITUTION OFFENSE', 'ABUSE / NEGLECT - CARE FACILITY',
       'MANU/DELIVER:CRACK', 'CANNABIS PLANT', 'MOB ACTION',
       'TO FIRE FIGHT.APP.EQUIP', 'OBSCENITY', 'RECKLESS HOMICIDE',
       'MANUFACTURE / DELIVER - HALLUCINOGEN', 'INTOXICATING COMPOUNDS',
       'FORCIBLE DETENTION', 'ATT: AUTOMOBILE', '$300 AND UNDER',
       'ATTEMPT - CYCLE, SCOOTER, BIKE NO VIN',
       'OTHER ARSON / EXPLOSIVE INCIDENT', 'PUBLIC DEMONSTRATION',
       'BIGAMY', 'AGGRAVATED PROTECTED EMPLOYEE - OTHER FIREARM',
       'VIOLENT OFFENDER - FAIL TO REGISTER NEW ADDRESS',
       'AGGRAVATED POLICE OFFICER - OTHER FIREARM',
       'ATTEMPT POSSESSION CANNABIS',
       'MANUFACTURE / DELIVER - AMPHETAMINES',
       'MANUFACTURE / DELIVER - SYNTHETIC MARIJUANA'], dtype=object)
Out[6]:
Description
SIMPLE                     7785
OVER $500                  5093
$500 AND UNDER             4721
DOMESTIC BATTERY SIMPLE    4656
AUTOMOBILE                 4242
Name: count, dtype: int64
In [7]:
# list the unique location coordinates
crimes['Location'].unique()
# show the 5 coordinates with the most crimes
crimes['Location'].value_counts()[:5]
Out[7]:
array([nan, '(41.917838056, -87.755968972)',
       '(41.995219444, -87.713354912)', ...,
       '(41.87253933, -87.640895762)', '(41.817272712, -87.634557976)',
       '(41.894226067, -87.619222865)'], dtype=object)
Out[7]:
Location
(41.883500187, -87.627876698)    81
(41.868541914, -87.639235361)    80
(41.788987036, -87.74147999)     66
(41.867428687, -87.626342565)    60
(41.963070794, -87.655984213)    59
Name: count, dtype: int64
In [8]:
# list the unique location descriptions
crimes['Location Description'].unique()
# show the 5 location descriptions with the most crimes
crimes['Location Description'].value_counts()[:5]
Out[8]:
array(['OTHER', 'STREET', 'PARKING LOT', 'APARTMENT',
       'SMALL RETAIL STORE', 'GAS STATION',
       'PARKING LOT / GARAGE (NON RESIDENTIAL)',
       'AIRPORT EXTERIOR - NON-SECURE AREA', nan, 'DAY CARE CENTER',
       'CREDIT UNION', 'RESIDENCE - GARAGE',
       'RESIDENCE - PORCH / HALLWAY', 'CURRENCY EXCHANGE', 'RESIDENCE',
       'AUTO / BOAT / RV DEALERSHIP',
       'POLICE FACILITY / VEHICLE PARKING LOT', 'DEPARTMENT STORE',
       'CHA PARKING LOT / GROUNDS', 'RESTAURANT', 'GROCERY FOOD STORE',
       'APPLIANCE STORE', 'OTHER (SPECIFY)',
       'RESIDENCE - YARD (FRONT / BACK)', 'ALLEY', 'SIDEWALK',
       'VEHICLE NON-COMMERCIAL', 'VACANT LOT / LAND', 'BAR OR TAVERN',
       'CAR WASH', 'HOSPITAL BUILDING / GROUNDS',
       'COMMERCIAL / BUSINESS OFFICE', 'DRIVEWAY - RESIDENTIAL',
       'PARK PROPERTY', 'BANK', 'DRUG STORE',
       'LAKEFRONT / WATERFRONT / RIVERBANK', 'SCHOOL - PUBLIC BUILDING',
       'AIRPORT TERMINAL LOWER LEVEL - SECURE AREA',
       'NURSING / RETIREMENT HOME', 'HOTEL / MOTEL', 'CONVENIENCE STORE',
       'CTA BUS STOP', 'AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA',
       'GOVERNMENT BUILDING / PROPERTY', 'TAVERN / LIQUOR STORE',
       'CTA PLATFORM', 'COLLEGE / UNIVERSITY - RESIDENCE HALL',
       'AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA',
       'VEHICLE - COMMERCIAL', 'SCHOOL - PUBLIC GROUNDS', 'WAREHOUSE',
       'CTA TRAIN', 'CTA BUS', 'ATM (AUTOMATIC TELLER MACHINE)',
       'AIRPORT TERMINAL UPPER LEVEL - SECURE AREA', 'CONSTRUCTION SITE',
       'AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA', 'ATHLETIC CLUB',
       'CHURCH / SYNAGOGUE / PLACE OF WORSHIP', 'CTA STATION',
       'CHA APARTMENT', 'CEMETARY', 'ABANDONED BUILDING',
       'CHA HALLWAY / STAIRWELL / ELEVATOR',
       'OTHER RAILROAD PROPERTY / TRAIN DEPOT', 'CHA GROUNDS', 'LIBRARY',
       'BOAT / WATERCRAFT',
       'VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER, ETC.)',
       'AIRPORT BUILDING NON-TERMINAL - SECURE AREA',
       'SPORTS ARENA / STADIUM', 'AIRCRAFT',
       'CTA PARKING LOT / GARAGE / OTHER PROPERTY', 'BARBERSHOP',
       'SCHOOL - PRIVATE GROUNDS', 'CTA TRACKS - RIGHT OF WAY',
       'GAS STATION DRIVE/PROP.', 'TAXICAB', 'ANIMAL HOSPITAL',
       'SCHOOL - PRIVATE BUILDING', 'MEDICAL / DENTAL OFFICE',
       'OTHER COMMERCIAL TRANSPORTATION', 'AIRPORT PARKING LOT',
       'CASINO/GAMBLING ESTABLISHMENT', 'MOVIE HOUSE / THEATER',
       'CLEANING STORE', 'POOL ROOM', 'FACTORY / MANUFACTURING BUILDING',
       'COLLEGE / UNIVERSITY - GROUNDS', 'AIRPORT EXTERIOR - SECURE AREA',
       'HIGHWAY / EXPRESSWAY', 'FEDERAL BUILDING', 'HOUSE', 'PAWN SHOP',
       'FIRE STATION', 'ROOMING HOUSE', 'VEHICLE - DELIVERY TRUCK',
       'HALLWAY', 'AUTO', 'COIN OPERATED MACHINE',
       'AIRPORT TRANSPORTATION SYSTEM (ATS)', 'JAIL / LOCK-UP FACILITY',
       'BRIDGE', 'PORCH', 'AIRPORT VENDING ESTABLISHMENT',
       'AIRPORT/AIRCRAFT', 'GARAGE', 'KENNEL', 'YARD', 'FOREST PRESERVE',
       'BOWLING ALLEY', 'AIRPORT TERMINAL MEZZANINE - NON-SECURE AREA',
       'VEHICLE - COMMERCIAL: ENTERTAINMENT / PARTY BUS', 'GANGWAY',
       'RETAIL STORE', 'VACANT LOT', 'CHA HALLWAY'], dtype=object)
Out[8]:
Location Description
STREET                                    18618
APARTMENT                                 11726
RESIDENCE                                  7805
SIDEWALK                                   3811
PARKING LOT / GARAGE (NON RESIDENTIAL)     2300
Name: count, dtype: int64
In [9]:
# list the unique years in the data
crimes['Year'].unique()
# show the five years with the most recorded crimes
crimes['Year'].value_counts()[:5]
Out[9]:
array([2007, 2021, 2023, 2002, 2024, 2022, 2019, 2020, 2011, 2015, 2014,
       2018, 2010, 2013, 2004, 2017, 2008, 2016, 2005, 2001, 2012, 2009,
       2003, 2006])
Out[9]:
Year
2023    43476
2024    20628
2022      402
2021      244
2020      101
Name: count, dtype: int64
In [10]:
# list the unique values of the Arrest flag
crimes['Arrest'].unique()
# count crimes with and without an arrest (only two values exist, so [:5] returns both)
crimes['Arrest'].value_counts()[:5]
Out[10]:
array([False,  True])
Out[10]:
Arrest
False    57302
True      8197
Name: count, dtype: int64

The X Coordinate and Y Coordinate fields give positions on a projected map. Their minimum of 0 suggests that some entries are placeholders for missing or unrecorded locations, since valid coordinates should not be zero.¶

The minimum latitude (36.62) and longitude (-91.69) fall well outside Chicago, so a few entries are geographic outliers; however, because I will rely on the Location and Location Description fields, I do not correct them here.¶

In [11]:
crimes.describe()
Out[11]:
ID Beat District Ward Community Area X Coordinate Y Coordinate Year Latitude Longitude
count 6.549900e+04 65499.000000 65499.000000 65458.000000 65463.000000 6.476200e+04 6.476200e+04 65499.000000 64762.000000 64762.000000
mean 1.326482e+07 1160.510420 11.374082 23.187357 36.178223 1.165304e+06 1.887743e+06 2023.184476 41.847548 -87.668855
std 9.276374e+05 709.939284 7.093998 13.963394 21.681136 1.686823e+04 3.252139e+04 1.400784 0.089464 0.061182
min 1.906000e+03 111.000000 1.000000 1.000000 1.000000 0.000000e+00 0.000000e+00 2001.000000 36.619446 -91.686566
25% 1.322308e+07 533.000000 5.000000 10.000000 22.000000 1.154142e+06 1.860389e+06 2023.000000 41.772342 -87.709409
50% 1.324707e+07 1034.000000 10.000000 23.000000 32.000000 1.167111e+06 1.894108e+06 2023.000000 41.865145 -87.661980
75% 1.358230e+07 1733.000000 17.000000 34.000000 53.000000 1.176670e+06 1.910796e+06 2024.000000 41.910858 -87.627214
max 1.361962e+07 2535.000000 31.000000 50.000000 77.000000 1.205119e+06 1.951503e+06 2024.000000 42.022549 -87.524542
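
One way to check these observations is to flag rows whose projected coordinates are zero or whose latitude and longitude fall outside a rough bounding box for Chicago. The sketch below is illustrative only: `sample` is a made-up frame, and the bounding-box limits are my own assumptions chosen loosely around the range shown in the `describe()` output, not official city limits.

```python
import pandas as pd

# Hypothetical rows; the second one mimics the minimum values shown by describe()
sample = pd.DataFrame({
    'X Coordinate': [1141387.0, 0.0, 1183071.0],
    'Y Coordinate': [1913179.0, 0.0, 1847869.0],
    'Latitude': [41.917838, 36.619446, 41.737751],
    'Longitude': [-87.755969, -91.686566, -87.604856],
})

# Assumed rough bounding box for Chicago (not an official boundary)
LAT_MIN, LAT_MAX = 41.6, 42.1
LON_MIN, LON_MAX = -88.0, -87.5

# Rows with a zero projected coordinate
zero_xy = (sample['X Coordinate'] == 0) | (sample['Y Coordinate'] == 0)

# Rows outside the box; between() is False for NaN, so missing values are flagged too
outside_box = ~(sample['Latitude'].between(LAT_MIN, LAT_MAX)
                & sample['Longitude'].between(LON_MIN, LON_MAX))

suspect = sample[zero_xy | outside_box]
print(suspect)
```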
In [12]:
crimes['Location Description'].describe()
Out[12]:
count      65084
unique       117
top       STREET
freq       18618
Name: Location Description, dtype: object
In [13]:
crimes['Location'].describe()
Out[13]:
count                             64762
unique                            44328
top       (41.883500187, -87.627876698)
freq                                 81
Name: Location, dtype: object
In [14]:
crimes['Primary Type'].describe()
Out[14]:
count     65499
unique       31
top       THEFT
freq      14831
Name: Primary Type, dtype: object
In [15]:
import matplotlib.pyplot as plt
crimes.boxplot(column='Year')
Out[15]:
<Axes: >
[Figure: box plot of the Year column]

Dropping 16 columns that are not necessary for my analysis¶

In [16]:
col_to_drop = ['ID','Case Number','Date','Block','IUCR','Domestic','Beat','District','Ward','Community Area','FBI Code','X Coordinate','Y Coordinate','Updated On','Latitude','Longitude']
#before drop
crimes.shape
crimes = crimes.drop(columns=col_to_drop)
#after drop
crimes.shape
Out[16]:
(65499, 22)
Out[16]:
(65499, 6)
In [17]:
crimes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Primary Type          65499 non-null  object
 1   Description           65499 non-null  object
 2   Location Description  65084 non-null  object
 3   Arrest                65499 non-null  bool  
 4   Year                  65499 non-null  int64 
 5   Location              64762 non-null  object
dtypes: bool(1), int64(1), object(4)
memory usage: 2.6+ MB

Checking out remaining nulls¶

In [18]:
crimes['Primary Type'].isnull().sum()
missing_info = crimes[crimes['Primary Type'].isnull()][['Primary Type','Year','Location','Location Description','Arrest']]
missing_info
Out[18]:
0
Out[18]:
Primary Type Year Location Location Description Arrest
In [19]:
crimes.to_csv('crimes_final.csv', header = True, index = False)

EDA Phase 2¶

In [20]:
#Load modules
import pandas as pd
#Load for visuals
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px 

#Setting up notebook to display multiple output in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
In [21]:
#Read in file from Phase 1
crimes = pd.read_csv('crimes_final.csv')
#What is the shape of the data
crimes.info()
#Look at first five records
crimes.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65499 entries, 0 to 65498
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Primary Type          65499 non-null  object
 1   Description           65499 non-null  object
 2   Location Description  65084 non-null  object
 3   Arrest                65499 non-null  bool  
 4   Year                  65499 non-null  int64 
 5   Location              64762 non-null  object
dtypes: bool(1), int64(1), object(4)
memory usage: 2.6+ MB
Out[21]:
Primary Type Description Location Description Arrest Year Location
0 ASSAULT SIMPLE OTHER False 2007 NaN
1 HOMICIDE FIRST DEGREE MURDER STREET True 2021 (41.917838056, -87.755968972)
2 HOMICIDE FIRST DEGREE MURDER PARKING LOT True 2021 (41.995219444, -87.713354912)
3 BURGLARY UNLAWFUL ENTRY APARTMENT False 2023 (41.952345086, -87.677975059)
4 BATTERY AGGRAVATED P.O. - HANDS, FISTS, FEET, NO / MIN... SMALL RETAIL STORE True 2023 (41.737750767, -87.604855911)

Using pandas crosstab to count occurrences for each combination of values in the Arrest and Year columns.¶

In [22]:
df50 = pd.crosstab(crimes['Arrest'], crimes['Year'])
df50.head(10)
Out[22]:
Year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ... 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
Arrest
False 25 37 41 16 31 5 7 16 21 15 ... 40 34 35 58 69 81 195 306 38232 17946
True 6 9 2 3 1 0 3 2 8 3 ... 11 6 8 8 13 20 49 96 5244 2682

2 rows × 24 columns

In every year, far more crimes were recorded without an arrest than with one. In 2023, only 5,244 of 43,476 recorded crimes (about 12%) resulted in an arrest, and in 2024, only 2,682 of 20,628 (about 13%). The number of records rose each year from 2016 to 2023, but because 2023 and 2024 together account for 64,104 of the 65,499 rows, this likely reflects how this extract was sampled rather than a real trend in crime.¶
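
Because the yearly totals differ so much, the raw counts are hard to compare across years. A possible refinement, sketched here on made-up data (`sample` is hypothetical), is to pass `normalize='columns'` to `pd.crosstab` so that each year's column sums to 1. The row labels `No Arrest` and `Arrest` are my own naming.

```python
import pandas as pd

# Hypothetical rows standing in for the Arrest and Year columns
sample = pd.DataFrame({
    'Arrest': [True, False, False, False, True, False],
    'Year':   [2023, 2023, 2023, 2023, 2024, 2024],
})

# Each column (year) sums to 1, so the cells are shares rather than counts
arrest_share = pd.crosstab(sample['Arrest'], sample['Year'], normalize='columns')

# Replace the False/True row labels with readable names
arrest_share.index = arrest_share.index.map({False: 'No Arrest', True: 'Arrest'})
print(arrest_share.round(2))
```

The `Arrest` row then gives the share of incidents in each year that ended in an arrest.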

Creating a heatmap of top 10 crimes¶

In [23]:
top_crimes = crimes['Primary Type'].value_counts().head(10).index
filtered_df = crimes[crimes['Primary Type'].isin(top_crimes)]
heatmap_data = filtered_df.pivot_table(
    index='Primary Type', columns='Year', values='Description', aggfunc='count', fill_value=0
)

# Plot the heatmap without numbers
sns.heatmap(heatmap_data, cmap="Blues")
plt.title("Top 10 Crimes by Year")
plt.ylabel("Crime Type")
plt.xlabel("Year")
plt.show()
Out[23]:
<Axes: xlabel='Year', ylabel='Primary Type'>
Out[23]:
Text(0.5, 1.0, 'Top 10 Crimes by Year')
Out[23]:
Text(50.7222222222222, 0.5, 'Crime Type')
Out[23]:
Text(0.5, 23.52222222222222, 'Year')
[Figure: heatmap "Top 10 Crimes by Year" (x-axis: Year, y-axis: Crime Type)]

The heatmap shows that theft counts were highest in 2023 and lower in 2024.¶

Counting Crimes by location¶

I am using plotly.express to find out where crimes take place the most.¶

In [24]:
locations = crimes['Location Description'].value_counts().head(10).reset_index()
locations.columns = ['Location', 'Count']
fig = px.bar(locations, x='Count', y='Location', orientation='h',
             title="Top Ten Locations",
             labels={'Count': 'Number of Crimes', 'Location': 'Location Description'},
             color='Count', color_continuous_scale='Purples')
fig.show()

The interactive bar chart shows that crimes occur most often on the street, with 18,618 reported incidents.¶
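
To see which crime types drive the counts at each location, the location descriptions can be cross-tabulated against `Primary Type`. This is a sketch on made-up rows (`sample` is hypothetical), not a result from the real data.

```python
import pandas as pd

# Hypothetical rows using labels that appear in the real columns
sample = pd.DataFrame({
    'Location Description': ['STREET', 'STREET', 'STREET',
                             'APARTMENT', 'APARTMENT', 'RESIDENCE'],
    'Primary Type': ['THEFT', 'MOTOR VEHICLE THEFT', 'THEFT',
                     'BATTERY', 'BATTERY', 'THEFT'],
})

# Rows: location descriptions, columns: crime types, cells: incident counts
loc_by_type = pd.crosstab(sample['Location Description'], sample['Primary Type'])

# Most frequent crime type at each location
top_type = loc_by_type.idxmax(axis=1)
print(loc_by_type)
print(top_type)
```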

My next goal is to compare which crimes had no arrest, and which crimes did.¶

In [25]:
# Grouping and counting crimes
arrest_count = crimes.groupby(['Primary Type', 'Arrest']).size().unstack(fill_value=0)

# Naming columns
arrest_count.columns = ['No Arrest', 'Arrest']

# Sort by the 'Arrest' column in descending order
arrest_count = arrest_count.sort_values('Arrest', ascending=False)

#Displaying
arrest_count.head()
Out[25]:
No Arrest Arrest
Primary Type
BATTERY 9330 1764
NARCOTICS 59 1245
WEAPONS VIOLATION 913 1155
THEFT 14012 819
OTHER OFFENSE 3232 697

I am creating a bar chart of arrested versus non-arrested crimes to compare their shares of all reported crimes.

In [26]:
#Plotting graph
crimes['Arrest'].value_counts(normalize=True).plot.bar(title='Percentage of Crimes to Arrest')
plt.show()
[Figure: bar chart "Percentage of Crimes to Arrest"; x-axis: Arrest]

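Among the five crime types shown in the table above, battery has the most arrests (1,764), but theft has the most incidents with no arrest (14,012, compared with 9,330 for battery). Narcotics has the highest arrest rate of the five: 1,245 of its 1,304 incidents (about 95%) led to an arrest.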
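Raw arrest counts favor the crime types with the most incidents, so one way to compare types fairly is the arrest rate (arrests divided by incidents). The following is a minimal sketch under assumptions: the helper name `arrest_rate_by_type` and the `toy` rows are hypothetical and only illustrate the calculation. It relies on the `Primary Type` and boolean `Arrest` columns used elsewhere in this notebook.

```python
import pandas as pd

def arrest_rate_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Incidents, arrests, and arrest rate per 'Primary Type', highest rate first."""
    grouped = df.groupby('Primary Type')['Arrest']
    summary = pd.DataFrame({
        'Incidents': grouped.size(),  # all rows of that type
        'Arrests': grouped.sum(),     # True counts as 1, False as 0
    })
    summary['Arrest Rate'] = summary['Arrests'] / summary['Incidents']
    return summary.sort_values('Arrest Rate', ascending=False)

# Hypothetical rows for illustration only; not taken from crimes.csv.
toy = pd.DataFrame({
    'Primary Type': ['THEFT'] * 4 + ['NARCOTICS'] * 2,
    'Arrest': [False, False, False, True, True, True],
})

rates = arrest_rate_by_type(toy)
print(rates)
```

On the real data, `arrest_rate_by_type(crimes)` would return one row per crime type. Types with very few incidents can show extreme rates, so it may help to look at the `Incidents` column alongside the rate.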

Crime Prevalence by Location Coordinates using a bar chart.

In [27]:
crimes['Location'].value_counts().head(10).plot.barh(title='Top 10 Coordinates with Most Crimes')
plt.show()
[Figure: horizontal bar chart "Top 10 Coordinates with Most Crimes"; y-axis: Location]

With this information, we now know that the coordinate (41.883500187, -87.627876698), which corresponds to 100-148 N State St, Chicago, IL 60602, appears to have the most crime.
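Counting exact coordinate strings treats points a few meters apart as different places. A possible refinement is to round the coordinates before counting so that nearby points fall into the same bin. The sketch below assumes the frame has numeric `Latitude` and `Longitude` columns, which are not shown in this section; the helper name `top_coordinate_bins` and the `toy` values are hypothetical.

```python
import pandas as pd

def top_coordinate_bins(df: pd.DataFrame, decimals: int = 3, n: int = 10) -> pd.Series:
    """Count rows per rounded (Latitude, Longitude) pair, largest first."""
    # Assumption: 'Latitude' and 'Longitude' are numeric columns in df.
    binned = df[['Latitude', 'Longitude']].dropna().round(decimals)
    counts = binned.groupby(['Latitude', 'Longitude']).size()
    return counts.sort_values(ascending=False).head(n)

# Hypothetical coordinates for illustration only; not taken from crimes.csv.
toy = pd.DataFrame({
    'Latitude': [41.8836, 41.8837, 41.8838, 41.7501, None],
    'Longitude': [-87.6278, -87.6279, -87.6281, -87.6002, -87.6002],
})

bins = top_coordinate_bins(toy)
print(bins)
```

Three decimal places of latitude correspond to a distance on the order of 100 meters, so the `decimals` argument controls how coarse the grouping is. Rows with a missing coordinate are dropped before counting.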

Next, I zoom in on 2015 to the present to get a clearer view of the crime trends.

In [28]:
#Limiting the years
crime_trends = crimes.groupby(['Year', 'Primary Type']).size().unstack(fill_value=0)
crime_trends = crime_trends[crime_trends.index >= 2015]

#plotting
crime_trends.plot(figsize=(12, 6), colormap='rainbow', linewidth=4)

plt.title("Crime Trends from 2015 to 2024", fontsize=15)
plt.xlabel("Year")
plt.ylabel("Number of Crimes")
plt.legend(title="Crime Type", bbox_to_anchor=(1.01, 1), loc='upper left')
plt.grid(axis='y', alpha=0.7)
plt.show()
[Figure: line chart "Crime Trends from 2015 to 2024"; x-axis: Year, y-axis: Number of Crimes, legend: Crime Type]

Some crimes, such as battery, assault, and burglary, either follow a consistent pattern or rise in most years. The sharp growth in recent years could have several independent causes, for instance changes in how crimes are recorded, population growth, or broader societal factors.
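To put a number on how quickly a crime type is rising, the yearly counts can be converted to year-over-year percent changes. This is a minimal sketch, assuming a table shaped like `crime_trends` above (years as the index, one column per crime type). The helper name `yoy_change` and the `toy_trends` values are hypothetical.

```python
import pandas as pd

def yoy_change(trends: pd.DataFrame) -> pd.DataFrame:
    """Percent change from the previous year for each crime-type column."""
    return trends.sort_index().pct_change().mul(100).round(1)

# Hypothetical counts for illustration only; not taken from crimes.csv.
toy_trends = pd.DataFrame(
    {'BATTERY': [100, 110, 121], 'ASSAULT': [50, 50, 75]},
    index=pd.Index([2021, 2022, 2023], name='Year'),
)

change = yoy_change(toy_trends)
print(change)
```

The first year has no previous year to compare against, so its row is NaN. A crime type with zero incidents in one year produces an undefined or infinite change for the following year, so those cells need separate handling.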

In [38]:
type_counts = crimes['Primary Type'].value_counts()

# Combine smaller categories into "Other"
threshold = 0.02  # Categories at or below 2% will be grouped
crime_counts = type_counts[type_counts / type_counts.sum() > threshold].copy()
crime_counts['Other'] = type_counts.sum() - crime_counts.sum()

# One pastel color per slice; defined before plt.pie so it takes effect
colors = sns.color_palette('pastel', len(crime_counts))

plt.figure(figsize=(8, 8))
plt.pie(crime_counts, labels=crime_counts.index, colors=colors,
        autopct='%1.1f%%', startangle=140, labeldistance=1.2)
plt.title('Proportion of Crime Types')
plt.show()
[Figure: pie chart "Proportion of Crime Types": THEFT 22.6%, BATTERY 16.9%, CRIMINAL DAMAGE 10.8%, MOTOR VEHICLE THEFT 9.5%, ASSAULT 8.9%, DECEPTIVE PRACTICE 6.4%, OTHER OFFENSE 6.0%, ROBBERY 4.9%, Other 13.9%]
In [56]:
import plotly.express as px

# Bubble chart of crime counts by year and type (bubble size = number of crimes)
bubble_data = crimes.groupby(['Year', 'Primary Type']).size().reset_index(name='Count')
fig = px.scatter(bubble_data, x='Year', y='Primary Type', size='Count',
                 color='Primary Type', title='Crime Counts by Year and Type')
fig.show()
In [57]:
import pandas as pd

# Group data by location and year
location_year_data = crimes.groupby(['Location Description', 'Year']).size().unstack(fill_value=0)
location_year_data = location_year_data.loc[["STREET", "RESIDENCE", "SIDEWALK", "PARKING LOT", "ALLEY"]]

# Plot stacked bar chart
location_year_data.T.plot(kind='bar', stacked=True, figsize=(12, 8))
plt.title('Crimes by Location Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Crimes')
plt.legend(title='Location Description')
plt.show()
[Figure: stacked bar chart "Crimes by Location Over Time"; x-axis: Year, y-axis: Number of Crimes, legend: Location Description]

Summary

These visualizations have been very helpful in showing which crimes are most prevalent and which locations to avoid or be more aware of. I expected the more violent crimes to be far more common in the area I was investigating, and I was also surprised by how many crimes did not result in an arrest.

In [30]:
#write out final files for EDA 2
crimes.to_csv('crimes_final.csv',header = True, index = False)